Path-Based Function Embedding and its Application to Specification Mining
نویسندگان
چکیده
Identifying the relationships among program elements is useful for program understanding, debugging, and analysis. One such relationship is synonymy. Function synonyms are functions that play a similar role in code, e.g. functions that perform initialization for different device drivers, or functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents func2vec, an algorithm that maps each function to a vector in a vector space such that function synonyms are grouped together. We compute the function embedding by training a neural network on sentences generated from random walks over an encoding of the program as a labeled pushdown system (l-PDS). We demonstrate that func2vec is effective at identifying function synonyms in the Linux kernel. Furthermore, we show how function synonyms enable mining error-handling specifications with high support in Linux file systems and drivers.
منابع مشابه
On Calibration and Application of Logit-Based Stochastic Traffic Assignment Models
There is a growing recognition that discrete choice models are capable of providing a more realistic picture of route choice behavior. In particular, influential factors other than travel time that are found to affect the choice of route trigger the application of random utility models in the route choice literature. This paper focuses on path-based, logit-type stochastic route choice models, i...
متن کاملOptimizing Cost Function in Imperialist Competitive Algorithm for Path Coverage Problem in Software Testing
Search-based optimization methods have been used for software engineering activities such as software testing. In the field of software testing, search-based test data generation refers to application of meta-heuristic optimization methods to generate test data that cover the code space of a program. Automatic test data generation that can cover all the paths of software is known as a major cha...
متن کاملLink Prediction using Network Embedding based on Global Similarity
Background: The link prediction issue is one of the most widely used problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. The link prediction local approaches with node structure objectives are fast in case of speed but are not accurate enough. On the other hand, the global link predicti...
متن کاملHeterogeneous Information Network Embedding for Meta Path based Proximity
A network embedding is a representation of a large graph in a lowdimensional space, where vertices are modeled as vectors. The objective of a good embedding is to preserve the proximity (i.e., similarity) between vertices in the original graph. This way, typical search and mining methods (e.g., similarity search, kNN retrieval, classification, clustering) can be applied in the embedded space wi...
متن کاملA Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1802.07779 شماره
صفحات -
تاریخ انتشار 2018